NSF PAR Search | NSF Public Access Repository


Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Liu, Xianpeng; Zheng, Ce; Qian, Ming; Xue, Nan; Chen, Chen; Zhang, Zhebin; Li, Chen; Wu, Tianfu (August 2024, IEEE CVPR)

This paper presents Multi-View Attentive Contextualization (MvACon), a simple yet effective method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object detection. Despite remarkable progress in query-based MV3D object detection, prior art often suffers either from failing to exploit high-resolution 2D features in dense attention-based lifting, due to high computational costs, or from insufficiently dense grounding of 3D queries to multi-scale 2D features in sparse attention-based lifting. Our proposed MvACon kills two birds with one stone using a representationally dense yet computationally sparse attentive feature contextualization scheme that is agnostic to specific 2D-to-3D feature lifting approaches. In experiments, the proposed MvACon is thoroughly tested on the nuScenes benchmark, using both BEVFormer and its recent 3D deformable attention (DFA3D) variant, as well as PETR, showing consistent detection performance improvement, especially in location, orientation, and velocity prediction. It is also tested on the Waymo-mini benchmark using BEVFormer, with similar improvement. We qualitatively and quantitatively show that global cluster-based contexts effectively encode dense scene-level contexts for MV3D object detection. The promising results of our proposed MvACon reinforce the adage in computer vision: “(contextualized) feature matters”.
Full Text Available
NEAT: Distilling 3D Wireframes from Neural Attraction Fields

Xue, Nan; Tan, Bin; Xiao, Yuxi; Dong, Liang; Xia, Gui-Song; Wu, Tianfu; Shen, Yujun (June 2024, IEEE CVPR)

This paper studies the problem of structured 3D reconstruction using wireframes that consist of line segments and junctions, focusing on the computation of structured boundary geometries of scenes. Instead of leveraging matching-based solutions from 2D wireframes (or line segments) for 3D wireframe reconstruction as done in prior art, we present NEAT, a rendering-distilling formulation that uses neural fields to represent 3D line segments with 2D observations, and bipartite matching for perceiving and distilling a sparse set of 3D global junctions. The proposed NEAT jointly optimizes the neural fields and the global junctions from scratch, using view-dependent 2D observations without precomputed cross-view feature matching. Comprehensive experiments on the DTU and BlendedMVS datasets demonstrate NEAT’s superiority over state-of-the-art alternatives for 3D wireframe reconstruction. Moreover, the 3D global junctions distilled by NEAT are a better initialization than SfM points for the recently emerged 3D Gaussian Splatting for high-fidelity novel view synthesis, using about 20 times fewer initial 3D points.
Full Text Available
Elastomers Fail from the Edge

https://doi.org/10.1103/PhysRevX.14.011054

Xue, Nan; Long, Rong; Dufresne, Eric R; Style, Robert W (March 2024, Physical Review X)

Full Text Available
Highly elastic fibers in a shear flow can form double helices

https://doi.org/10.1088/1367-2630/ad56c0

Słowicka, Agnieszka M; Xue, Nan; Liu, Lujia; Nunes, Janine K; Sznajder, Paweł; Stone, Howard A; Ekiel-Jeżewska, Maria L (July 2024, New Journal of Physics)

The long-time behavior of highly elastic fibers in a shear flow is investigated experimentally and numerically. Characteristic attractors of the dynamics are found. It is shown that for a small ratio of bending to hydrodynamic forces, most fibers form a spinning elongated double helix, performing an effective Jeffery orbit very close to the vorticity direction. Recognition of these oriented shapes, and how they form in time, may prove useful in the future for understanding the time history of complex microstructures in fluid flows and considering processing steps for their synthesis.
Full Text Available
Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver

Liu, Xianpeng; Zheng, Ce; Cheng, Kelvin; Xue, Nan; Qi, Goo-Jun; Wu, Tianfu (October 2023, The Computer Vision Foundation (CVF))

The main challenge of monocular 3D object detection is the accurate localization of the 3D center. Motivated by a new and strong observation that this challenge can be remedied by a 3D-space local-grid search scheme in an ideal case, we propose a stage-wise approach that combines the information flow from 2D-to-3D (3D bounding box proposal generation with a single 2D image) and 3D-to-2D (proposal verification by denoising with 3D-to-2D contexts) in a top-down manner. Specifically, we first obtain initial proposals from off-the-shelf backbone monocular 3D detectors. Then, we generate a 3D anchor space by local-grid sampling from the initial proposals. Finally, we perform 3D bounding box denoising at the 3D-to-2D proposal verification stage. To effectively learn discriminative features for denoising highly overlapped proposals, this paper presents a method that uses the Perceiver I/O model [20] to fuse the 3D-to-2D geometric information and the 2D appearance information. With the encoded latent representation of a proposal, the verification head is implemented with a self-attention module. Our method, named MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detector. Experimental results on the well-established KITTI dataset and the challenging large-scale Waymo dataset show that MonoXiver consistently achieves improvement with limited computation overhead.
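The local-grid sampling step described above can be pictured with a minimal sketch. It is not the MonoXiver implementation: the function name `local_grid_anchors` and the parameters `step` and `radius` are hypothetical, and the sketch only enumerates candidate 3D centers on a regular grid around one initial proposal center.

```python
from itertools import product


def local_grid_anchors(center, step=0.5, radius=1):
    """Return candidate 3D centers on a regular grid around `center`.

    `step` (grid spacing) and `radius` (number of grid cells on each side of
    the center) are hypothetical parameters chosen for illustration only.
    """
    cx, cy, cz = center
    offsets = range(-radius, radius + 1)
    return [
        (cx + i * step, cy + j * step, cz + k * step)
        for i, j, k in product(offsets, repeat=3)
    ]


# One initial proposal center (x, y, z), e.g. from a backbone detector.
proposal_center = (2.0, 1.0, 15.0)
anchors = local_grid_anchors(proposal_center, step=0.5, radius=1)
print(len(anchors))  # 3 * 3 * 3 = 27 candidate centers
```

In a pipeline of the kind the abstract describes, each candidate would then be scored by a verification stage; that stage is not sketched here.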
Full Text Available
Level-S²fM: Structure from Motion on Neural Level Set of Implicit Surfaces

https://doi.org/10.1109/CVPR52729.2023.01650

Xiao, Yuxi; Xue, Nan; Wu, Tianfu; Xia, Gui-Song (June 2023, IEEE)

This paper presents a neural incremental Structure-from-Motion (SfM) approach, Level-S²fM, which estimates the camera poses and scene geometry from a set of uncalibrated images by learning coordinate MLPs for the implicit surfaces and the radiance fields from the established key-point correspondences. Our novel formulation poses some new challenges due to inevitable two-view and few-view configurations in the incremental SfM pipeline, which complicate the optimization of coordinate MLPs for volumetric neural rendering with unknown camera poses. Nevertheless, we demonstrate that the strong inductive basis conveyed by the 2D correspondences is promising for tackling those challenges by exploiting the relationship between the ray sampling schemes. Based on this, we revisit the pipeline of incremental SfM and renew the key components, including two-view geometry initialization, camera pose registration, 3D point triangulation, and Bundle Adjustment, with a fresh perspective based on neural implicit surfaces. By unifying the scene geometry in small MLP networks through coordinate MLPs, our Level-S²fM treats the zero-level set of the implicit surface as an informative top-down regularization to manage the reconstructed 3D points, reject the outliers in correspondences by querying the SDF, and refine the estimated geometries by NBA (Neural BA). Not only does our Level-S²fM lead to promising results on camera pose estimation and scene geometry reconstruction, but it also shows a promising way toward neural implicit rendering without knowing camera extrinsics beforehand.
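As a rough illustration of rejecting outliers by querying an SDF, the sketch below keeps only 3D points that lie close to the zero-level set. It is an assumption-laden toy: an analytic sphere SDF stands in for the learned coordinate MLP, and the names `sphere_sdf`, `filter_by_sdf`, and the threshold `tau` are hypothetical.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface.

    Stand-in for a learned SDF network (assumption for illustration).
    """
    return math.dist(point, center) - radius


def filter_by_sdf(points, sdf, tau=0.05):
    """Keep points whose absolute signed distance is below `tau`."""
    return [p for p in points if abs(sdf(p)) < tau]


points = [
    (1.0, 0.0, 0.0),   # on the surface
    (0.0, 1.02, 0.0),  # slightly off the surface
    (0.0, 0.0, 2.0),   # far outside
    (0.1, 0.1, 0.1),   # deep inside
]
inliers = filter_by_sdf(points, sphere_sdf)
print(inliers)
```

The threshold trades off tolerance to triangulation noise against the risk of keeping spurious points; its value here is arbitrary.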
Full Text Available
NOPE-SAC: Neural One-Plane RANSAC for Sparse-View Planar 3D Reconstruction

https://doi.org/10.1109/TPAMI.2023.3314745

Tan, Bin; Xue, Nan; Wu, Tianfu; Xia, Gui-Song (January 2023, IEEE Transactions on Pattern Analysis and Machine Intelligence)

This paper studies the challenging two-view 3D reconstruction problem in a rigorous sparse-view configuration, which suffers from insufficient correspondences in the input image pairs for camera pose estimation. We present a novel Neural One-PlanE RANSAC framework (termed NOPE-SAC for short) that exploits the capability of neural networks to learn one-plane pose hypotheses from 3D plane correspondences. Building on top of a Siamese network for plane detection, our NOPE-SAC first generates putative plane correspondences with a coarse initial pose. It then feeds the learned 3D plane correspondences into shared MLPs to estimate the one-plane camera pose hypotheses, which are subsequently reweighted in a RANSAC manner to obtain the final camera pose. Because the neural one-plane pose minimizes the number of plane correspondences needed for adaptive pose hypothesis generation, it enables stable pose voting and reliable pose refinement with only a few plane correspondences for the sparse-view inputs. In the experiments, we demonstrate that our NOPE-SAC significantly improves the camera pose estimation for two-view inputs with severe viewpoint changes, setting several new state-of-the-art results on two challenging benchmarks, i.e., MatterPort3D and ScanNet, for sparse-view 3D reconstruction. The source code is released at https://github.com/IceTTTb/NopeSAC for reproducible research.
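The idea of reweighting pose hypotheses in a RANSAC manner can be sketched with a deliberately simplified toy. It is not the NOPE-SAC method: the hypotheses here are given translation vectors rather than network outputs, rotation is assumed to be the identity, and every name (`plane_residual`, `score`, `vote_translation`, `threshold`) is hypothetical. A plane is written as n·x = d, so under x' = x + t a consistent correspondence satisfies d2 = d1 + n·t.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plane_residual(t, plane):
    """Offset mismatch of one plane correspondence under translation t.

    `plane` is (unit normal n, offset d1 in view 1, offset d2 in view 2).
    Identity rotation is assumed (toy setting).
    """
    n, d1, d2 = plane
    return abs(d1 + dot(n, t) - d2)


def score(t, planes, threshold=0.05):
    """Count the plane correspondences that agree with hypothesis t."""
    return sum(1 for p in planes if plane_residual(t, p) < threshold)


def vote_translation(hypotheses, planes, threshold=0.05):
    """Average the hypotheses, weighted by their inlier counts."""
    weights = [score(t, planes, threshold) for t in hypotheses]
    total = sum(weights)
    if total == 0:
        return None  # no hypothesis is supported by any correspondence
    return tuple(
        sum(w * t[i] for w, t in zip(weights, hypotheses)) / total
        for i in range(3)
    )


# Three axis-aligned planes observed before and after a shift of (1, 2, 3).
planes = [((1, 0, 0), 0.0, 1.0), ((0, 1, 0), 0.0, 2.0), ((0, 0, 1), 0.0, 3.0)]
hypotheses = [(1.0, 2.0, 3.0), (5.0, 5.0, 5.0), (1.0, 2.0, 3.01)]
final_t = vote_translation(hypotheses, planes)
print(final_t)
```

The badly wrong hypothesis receives zero weight, so the result is the average of the two hypotheses that the plane correspondences support.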
Full Text Available
Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation

Xue, Nan and (June 2022, IEEE/CVF CVPR)

This paper studies the problem of multi-person pose estimation in a bottom-up fashion. With a new and strong observation that the localization issue of the center-offset formulation can be remedied by a local-window search scheme in an ideal situation, we propose a multi-person pose estimation approach, dubbed LOGO-CAP, by learning the LOcal-GlObal Contextual Adaptation for human Pose. Specifically, our approach first learns the keypoint attraction maps (KAMs) from the local keypoint expansion maps (KEMs) in small local windows; the KAMs are subsequently treated as dynamic convolutional kernels on the keypoint-focused global heatmaps for contextual adaptation, achieving accurate multi-person pose estimation. Our method is end-to-end trainable with near real-time inference speed in a single forward pass, obtaining state-of-the-art performance on the COCO keypoint benchmark for bottom-up human pose estimation. With the COCO-trained model, our method also outperforms prior art by a large margin on the challenging OCHuman dataset.
Full Text Available
Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning

https://doi.org/10.1109/TPAMI.2023.3312749

Xue, Nan; Wu, Tianfu; Bai, Song; Wang, Fu-Dong; Xia, Gui-Song; Zhang, Liangpei; Torr, Philip H.S. (January 2023, IEEE Transactions on Pattern Analysis and Machine Intelligence)

This article presents Holistically-Attracted Wireframe Parsing (HAWP), a method for geometric analysis of 2D images containing wireframes formed by line segments and junctions. HAWP utilizes a parsimonious Holistic Attraction (HAT) field representation that encodes line segments using a closed-form 4D geometric vector field. The proposed HAWP consists of three sequential components empowered by end-to-end and HAT-driven designs: (1) generating a dense set of line segments from HAT fields and endpoint proposals from heatmaps, (2) binding the dense line segments to sparse endpoint proposals to produce initial wireframes, and (3) filtering false positive proposals through a novel endpoint-decoupled line-of-interest aligning (EPD LOIAlign) module that captures the co-occurrence between endpoint proposals and HAT fields for better verification. Thanks to our novel designs, HAWPv2 shows strong performance in fully supervised learning, while HAWPv3 excels in self-supervised learning, achieving superior repeatability scores and efficient training (24 GPU hours on a single GPU). Furthermore, HAWPv3 exhibits promising potential for wireframe parsing in out-of-distribution images without requiring ground-truth wireframe labels.
Full Text Available
Shear-induced migration of confined flexible fibers

https://doi.org/10.1039/d1sm01256h

Xue, Nan; Nunes, Janine K.; Stone, Howard A. (January 2022, Soft Matter)

We report an experimental study of the shear-induced migration of flexible fibers in suspensions confined between two parallel plates. Non-Brownian fiber suspensions are imaged in a rheo-microscopy setup, where the top and the bottom plates counter-rotate and create a Couette flow. Initially, the fibers are near the bottom plate due to sedimentation. Under shear, the fibers move with the flow and migrate towards the center plane between the two walls. Statistical properties of the fibers, such as the mean values of the positions, orientations, and end-to-end lengths of the fibers, are used to characterize the behaviors of the fibers. A dimensionless parameter Λ_eff, which compares the hydrodynamic shear stress and the fiber stiffness, is used to analyze the effective flexibility of the fibers. The observations show that the fibers that are more likely to bend exhibit faster migration. As Λ_eff increases (softer fibers and stronger shear stresses), the fibers tend to align in the flow direction and the motions of the fibers transition from tumbling and rolling to bending. The bending fibers drift away from the walls to the center plane. Further increasing Λ_eff leads to more coiled fiber shapes, and the bending is more frequent and with larger magnitudes, which leads to more rapid migration towards the center. Different behaviors of the fibers are quantified with Λ_eff, and the structures and the dynamics of the fibers are correlated with the migration.
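The per-fiber statistics mentioned above (orientation and end-to-end length) can be computed from a tracked centerline along the following lines. This is a sketch under stated assumptions, not the authors' analysis code: fibers are assumed to be 2D polylines in the imaging plane, the flow direction is assumed to be the x axis, and all function names are hypothetical.

```python
import math


def end_to_end_length(fiber):
    """Straight-line distance between the two fiber ends."""
    return math.dist(fiber[0], fiber[-1])


def contour_length(fiber):
    """Sum of the segment lengths along the centerline."""
    return sum(math.dist(a, b) for a, b in zip(fiber, fiber[1:]))


def orientation_to_flow(fiber):
    """Angle in [0, pi/2] between the end-to-end vector and the x axis.

    The x axis is assumed to be the flow direction; the two fiber ends are
    treated as indistinguishable, hence the absolute values.
    """
    dx = fiber[-1][0] - fiber[0][0]
    dy = fiber[-1][1] - fiber[0][1]
    return math.atan2(abs(dy), abs(dx))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
fibers = [straight, bent]

# Ratio of end-to-end to contour length: 1 for a straight fiber, less if bent.
straightness = [end_to_end_length(f) / contour_length(f) for f in fibers]
print(straightness)
print(mean(orientation_to_flow(f) for f in fibers))
```

Normalizing the end-to-end length by the contour length gives a simple, dimensionless measure of how coiled a fiber is.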
Full Text Available
